DATA AND RESULTS VISUALIZATION PROJECT: THE IMPACT OF COVID-19 ON THE MUSIC INDUSTRY

Italy streams trend analysis

The first step in our analysis is a close examination of streams in Italy, based on Spotify's weekly top 200 chart. The goal is to gain insight into the events that cause the number of streams to increase or decrease. We start by plotting the number of streams for each of the years 2017, 2018, 2019, and 2020.
We want to identify patterns and interesting trends, such as a decrease in the number of streams due to the spread of the pandemic, or event-related peaks.

The first dataframe extracted from our dataset and taken into consideration contains four attributes:

  1. avg_streams : the average number of streams in the weekly top 200, obtained by summing the streams of every track in that week's top 200 and dividing by the number of tracks (200)
  2. date : the start date of the week
  3. year
  4. number_of_weeks : the index of the week within its year. We use an index because Spotify's chart weeks do not start on the same calendar day every year:
    in one year the first week may start on January 1st, in another on January 3rd.
    The offset between the weeks of different years is only a few days, so it does not affect our analysis.
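The four attributes above can be derived from raw chart rows along these lines. This is a minimal, dependency-free sketch rather than the notebook's actual code: the row layout `(week_start, track, streams)`, the sample values and the function name `weekly_summary` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical chart rows: (week_start, track, streams). A real week would
# hold 200 rows; three rows per week keep the example readable.
rows = [
    (date(2019, 1, 4), "a", 300), (date(2019, 1, 4), "b", 200), (date(2019, 1, 4), "c", 100),
    (date(2019, 1, 11), "a", 400), (date(2019, 1, 11), "b", 300), (date(2019, 1, 11), "c", 200),
    (date(2020, 1, 3), "a", 600), (date(2020, 1, 3), "b", 500), (date(2020, 1, 3), "c", 400),
]

def weekly_summary(rows):
    """Return one record per week with the four attributes described above."""
    by_week = defaultdict(list)
    for week_start, _track, streams in rows:
        by_week[week_start].append(streams)

    summary = []
    weeks_seen = defaultdict(int)  # number of weeks already seen, per year
    for week_start in sorted(by_week):
        streams = by_week[week_start]
        weeks_seen[week_start.year] += 1
        summary.append({
            # With a full chart, len(streams) is 200.
            "avg_streams": sum(streams) / len(streams),
            "date": week_start,
            "year": week_start.year,
            # Position of the week within its year, not a calendar week number.
            "number_of_weeks": weeks_seen[week_start.year],
        })
    return summary

summary = weekly_summary(rows)
```

Indexing weeks by their position within the year, instead of by calendar date, is what lets the curves of different years be overlaid despite the few days of offset.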

The resulting dataframe looks as follows:

Plotting streams trend in Italy

The results below show that the beginning of the pandemic and the consequent lockdown coincided with a significant decrease in the number of streams, compared to the same period in previous years.

We can also notice that streams grew steadily through 2017, 2018, and 2019, and that this growth seems to level off between 2019 and 2020. The upward trend is easily explained by Spotify's growing popularity and the increasing number of users on the platform.

We include an interactive plot here so that the reader can zoom in on interesting areas.

The following heatmaps show the distribution of streams for each year. Each row is a week of the year, and each column is a position in the top 200. For clarity, we limited the heatmaps to the top 50 and the top 15 positions: plotting the whole top 200 resulted in an almost completely black heatmap.
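The matrix behind such a heatmap can be sketched as follows, with one row per week and one column per chart position, truncated to the first `top_n` positions. The row layout `(week_index, position, streams)` and the name `heatmap_matrix` are hypothetical, and the sample values are made up.

```python
def heatmap_matrix(rows, top_n):
    """Build a week x position matrix of streams, keeping positions 1..top_n."""
    weeks = sorted({week for week, _pos, _streams in rows})
    row_of = {week: i for i, week in enumerate(weeks)}
    matrix = [[0] * top_n for _ in weeks]
    for week, position, streams in rows:
        if 1 <= position <= top_n:  # drop the tail of the chart
            matrix[row_of[week]][position - 1] = streams
    return matrix

# Hypothetical rows: (week_index, position, streams)
rows = [
    (1, 1, 900), (1, 2, 500), (1, 3, 100),
    (2, 1, 800), (2, 2, 600), (2, 3, 50),
]
top2 = heatmap_matrix(rows, top_n=2)
```

The resulting list of lists can be passed to any heatmap function of a plotting library.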

Musical Features Correlation

The following analysis aims to highlight emerging trends in popular music by analyzing the correlation of certain musical features, such as energy, acousticness, loudness, and speechiness, with the platform's number of streams. The high correlation between streams and speechiness is a strong indication that Hip-Hop/Rap/Trap tracks dominate Spotify's top 200.
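As an illustration of how such a correlation can be computed, here is a minimal sketch that assumes Pearson's coefficient (the text does not state which coefficient was used). The track values are made up, and only two of the features are shown.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical tracks; the feature names follow Spotify's audio features.
tracks = [
    {"streams": 900, "speechiness": 0.30, "acousticness": 0.10},
    {"streams": 700, "speechiness": 0.25, "acousticness": 0.20},
    {"streams": 400, "speechiness": 0.10, "acousticness": 0.50},
    {"streams": 200, "speechiness": 0.05, "acousticness": 0.40},
]

streams = [t["streams"] for t in tracks]
correlation = {
    feature: pearson([t[feature] for t in tracks], streams)
    for feature in ("speechiness", "acousticness")
}
```

A coefficient close to 1 means that the feature and the number of streams grow together. It does not by itself show that one causes the other.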

Analyzing Stream Peaks

There are some significant peaks in the avg_streams that we wish to analyze to unravel the causes of such spikes. Let us retrieve the top 200 of the weeks during which those peaks occur.

7-14 February 2020 and 8-15 February 2019 peaks

We retrieve the top 200 of these weeks and analyze it.

The following plots analyze the stream peak of 7-14 February 2020, which is highlighted on the stream trend plot above.

The barchart shows the number of songs in the top 200 per artist, while the treemap shows the percentage of streams per artist.
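The two quantities behind these plots, the number of tracks per artist and the share of streams per artist, can be sketched as below. The row layout `(artist, track, streams)`, the artist names and the numbers are hypothetical.

```python
from collections import Counter, defaultdict

def artist_breakdown(chart):
    """Return (tracks per artist, percentage of streams per artist)."""
    track_counts = Counter(artist for artist, _track, _streams in chart)
    streams_by_artist = defaultdict(int)
    for artist, _track, streams in chart:
        streams_by_artist[artist] += streams
    total = sum(streams_by_artist.values())
    share = {artist: 100 * s / total for artist, s in streams_by_artist.items()}
    return track_counts, share

# Hypothetical chart: one artist with many small tracks, one with a single hit.
chart = [
    ("Artist A", "t1", 100),
    ("Artist A", "t2", 100),
    ("Artist A", "t3", 100),
    ("Artist B", "t4", 700),
]
track_counts, share = artist_breakdown(chart)
```

In this made-up chart, Artist A has three times as many tracks as Artist B but only 30% of the streams. A count of tracks alone can therefore point to the wrong artist as the source of a peak.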

The barchart shows a large number of tracks by Shiva, who released the Routine EP on 31 January 2020.
However, the barchart can be misleading.
Shiva's release was indeed a major one (we will see later how releases impact streams), but in this case it is not the main reason for the increase in streams.

In fact, the treemap shows several artists who account for a sizeable percentage of streams even though they have three or fewer tracks in the top 200.

These artists are Diodato, Marracash, Fasma, Francesco Gabbani and Elettra Lamborghini.

The plots also show a strong presence of Tha Supreme. This is due to his major release in November 2019, which kept him on the top charts for a long time (we will analyze his case later).

This result raises a question: why do some artists, like Diodato, have such a high percentage of streams even though they have three or fewer tracks in the top 200?

As the following top 10 shows, the first 8 songs come from the Sanremo Festival, with Diodato, the winner of Sanremo 2020, in first position.

We now analyze the same week in 2019, that is, the period 8-15 February 2019.

The histogram on the left shows a large number of tracks by Madman, who released the MM Vol.3 Mixtape on 1 February 2019.
However, in the histogram of the percentage of streams, Madman is only in third position, with Ultimo and Mahmood in first and second position respectively.

Mahmood won the Sanremo Festival 2019, while Ultimo took second place.

Mahmood's first place was widely discussed and criticized. The public's favourite was Ultimo, with 48.80% of the votes, while Mahmood got only 20.95% (source: https://www.sorrisi.com/musica/sanremo/sanremo-2019-analisi-del-televoto-e-dati-delle-giuri/).

The technical jury decided to assign first place to Mahmood. The resulting criticism and discussion in the media gave Ultimo huge visibility and, with it, streams on Spotify.

Looking at the weekly top 200 for this week, 6 of the top 10 songs are from the Sanremo Festival.

Correlation between musical events and Spotify streams

It is now clear that the February stream spikes in Italy are generally due to new releases and, more importantly, to the Sanremo Festival. This analysis reveals a correlation between musical events (festivals, awards, etc.) and stream trends, which is further confirmed in the following analyses.

5-12 July 2019

There is a huge spike in the number of streams in this week.

The chart contains a large number of tracks from MACHETE. The MACHETE Mixtape 4 was released on 5 July 2019 and was a huge success, featuring major collaborations between big names of the Italian rap scene.

MACHETE also accounts for more than 30% of the weekly streams.

Almost the whole top 10 comes from that album.

15-22 November 2019

The left histogram shows 20 songs by Tha Supreme. He released his highly anticipated debut album 236451 on 15 November 2019, and it was a huge success.
Almost 35% of the weekly top 200 streams were generated by Tha Supreme, followed by Marracash, who had released his album Persona on 31 October 2019.

As we can see, essentially the whole top 10 is by Tha Supreme.

9-16 November 2018

The top 200 contains 15 tracks by Salmo, whose album Playlist was released on 9 November 2018.

Almost the whole top 10 is taken by Salmo.

19-26 January 2018

We can easily spot 18 tracks by Sfera Ebbasta. He released his album Rockstar on 19 January 2018, and it was a huge success in the Italian trap scene.

The whole top 10 is taken by Sfera Ebbasta.

Releases and events impact number of streams

To conclude, we can observe a correlation between the number of streams on Spotify and major releases and musical events.
This correlation is more evident in the last two years, now that Spotify has become a widespread platform used daily. In 2017 and at the beginning of 2018, the service's popularity was still growing, as shown by the constant increase in the number of streams.

In Italy, most spikes were driven by Italian Hip-Hop/Rap/Trap artists, as anticipated by our correlation plot, which showed a high correlation between speechiness and streams.

For the sake of clarity, the charts above cover only a limited number of peaks. The same analysis was performed on all the peaks in the plot.

We also identified a decreasing trend in the number of streams during the quarantine period.
Given the previous results, a natural question is: "Is this decreasing trend caused directly by the pandemic, or is it caused by the lack of new releases and musical events?"

We will analyze the number of releases in the pandemic period and compare it with the number of releases in the same period in previous years.

Analysis of new releases

We now analyze the number of new Hip-Hop/Rap releases during the quarantine period.

Hip-Hop/Rap releases during quarantine period

The following table reports the total number of releases for each year in the period from 6 March to 15 May, which spans the entire lockdown period in Italy (sources: http://www.salute.gov.it/portale/nuovocoronavirus/dettaglioNotizieNuovoCoronavirus.jsp?lingua=italiano&menu=notizie&p=dalministero&id=4184 and https://www.rainews.it/dl/rainews/articoli/coronavirus-Fase-2-ecco-come-si-riparte-il-18-maggio-6a0c52c3-2f99-4170-95de-e6d221e7ff21.html).

The following plot shows the number of new releases in the period from 03-06 to 05-15 (month-day) for each year.

As the plot shows, the number of releases increases from year to year and then drops in 2020.

Probably no major Italian Hip-Hop/Rap releases

By looking at the plot above, we can notice a drastic decrease in the number of releases in 2020. As discussed previously, most of the significant peaks in the number of streams are related to influential album releases by Italian rappers.

We first build a dataframe that contains the number of new releases per artist in the period from 03-06 to 05-15 for each year. For example, we end up with something like:

| Artist | Year | New |
|---|---|---|
| Queen | 1972 | 15 |
| Queen | 1980 | 12 |
| Michael Jackson | 1979 | 10 |

This means that Queen released 15 new tracks between 03-06-1972 and 05-15-1972 and 12 new tracks between 03-06-1980 and 05-15-1980, that Michael Jackson released 10 new tracks between 03-06-1979 and 05-15-1979, and so on.
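A count of this kind can be sketched as follows. We assume here that a "new release" is a track whose release date falls between 6 March and 15 May of its release year, and that each track ID is counted once even if it charts for several weeks. The row layout, the IDs and the dates are hypothetical.

```python
from collections import Counter
from datetime import date

def in_window(release_date, start=(3, 6), end=(5, 15)):
    """True if the (month, day) of release_date lies in [start, end]."""
    return start <= (release_date.month, release_date.day) <= end

def releases_per_artist_year(tracks):
    """Count distinct tracks released inside the window, per (artist, year)."""
    seen_ids = set()
    counts = Counter()
    for artist, track_id, release_date in tracks:
        if track_id in seen_ids:
            continue  # the same track appears in several weekly charts
        seen_ids.add(track_id)
        if in_window(release_date):
            counts[(artist, release_date.year)] += 1
    return counts

# Hypothetical rows: (artist, track_id, release_date)
tracks = [
    ("Queen", "q1", date(1972, 3, 10)),
    ("Queen", "q1", date(1972, 3, 10)),  # same track, charting a second week
    ("Queen", "q2", date(1972, 5, 15)),  # last day of the window
    ("Queen", "q3", date(1972, 2, 1)),   # released before the window
    ("Michael Jackson", "m1", date(1979, 4, 1)),
]
counts = releases_per_artist_year(tracks)
```

Comparing `(month, day)` tuples keeps the window identical for every year without building a separate date range per year.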

As we can see, Italian releases in this period were numerous in 2017, 2018, and 2019 compared with 2020, which saw only one release, by Nitro, right at the beginning of the quarantine period: his album GarbAge came out on 6 March 2020.

An attentive reader may notice something peculiar: we have a decent number of new releases from Ghali and Marracash in the period from 03-06-2020 to 05-15-2020, but Ghali released his album DNA on 21 February 2020 and Marracash released Persona on 31 October 2019.

Why do these artists show new releases that correspond neither to a new album nor to new singles? We analyze this situation below.

Analyzing the Marracash anomaly

In the file it_2020-03-27--2020-04-03 we find 6 new releases by Marracash!

We also retrieve the top 200 for the week in which the album was released, the week of 1 November 2019. In this second dataframe there are 15 tracks by Marracash in the top 200, essentially all the tracks of the album Persona.

The tracks are the following:

  1. Body Parts - I denti
  2. Qualcosa in cui credere - Lo scheletro (feat. Gué Pequeno)
  3. Quelli che non pensano - Il cervello (feat. Coez)
  4. Appartengo - Il sangue (feat. Massimo Pericolo)
  5. Poco di buono - Il fegato
  6. Bravi a cadere - I polmoni
  7. Non sono Marra - La pelle (feat. Mahmood)
  8. Supreme - L'ego (feat. Tha Supreme, Sfera Ebbasta)
  9. Sport - I muscoli (feat. Luchè)
  10. Da buttare - Il ca**o
  11. Crudelia - I nervi
  12. G.O.A.T - Il cuore
  13. Madame - L'anima (feat. Madame)
  14. Tutto questo niente - Gli occhi
  15. Greta Thunberg - Lo stomaco (feat. Cosmo)

All the tracks in the first dataframe, except for SPORT + muscoli (RMX), are also present in the second dataframe, but with different release dates and different IDs!

Why? We will find out below.

First, we pick a track that is present in both dataframes and check whether it is actually the same track by querying the Spotify API. We use SUPREME - L'Ego.

The album is the same, and all the artist information is exactly the same! But there are 2 major differences:

  1. Release date: the song in the first dataframe has release date 2020-03-27, while the one in the second dataframe has release date 2019-10-31, which is consistent with the official release date of the album Persona
  2. Number of tracks: the album has 2 different track counts! 17 if we look it up through the song ID from the first dataframe, and 15 if we use the one from the second dataframe.

From these points and from the dataframes, we can conclude that 2 tracks were added later: SPORT + muscoli (RMX) and NEON - Le Ali, which are present in the first dataframe but not in the official track list of the album Persona.

We discovered that adding extra tracks (remixes, collaborations, featurings, etc.) to an album automatically updates the release date of all the tracks in the album!

This can be seen by comparing two links. The following one contains the album Persona with release date 2019: https://open.spotify.com/album/19iZTn6IM82raMquk5Z7Ul

This one contains the same album, but with release date 2020: https://open.spotify.com/album/3ZOt77e63uMgJXU7xcFpqu

It is worth noting that the same track has exactly the same number of streams in the 2 album versions, which gives some hints about how Spotify stores music. It seems that the track is actually the same, but it has a different ID and release date in each of the two versions of the album.

The track we're using as test is SUPREME - L'ego.
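One possible way to keep re-issued tracks from being counted as new releases is to match tracks by artist and normalized title and to keep the earliest release date seen for each match. The sketch below is a heuristic of ours and not something provided by the Spotify API. The track IDs are placeholders, while the titles and dates are the ones discussed above.

```python
from datetime import date

def normalize(title):
    """Case- and whitespace-insensitive form of a title, used as matching key."""
    return " ".join(title.casefold().split())

def first_release_dates(tracks):
    """Map (artist, normalized title) to the earliest release date seen."""
    first_seen = {}
    for _track_id, artist, title, release_date in tracks:
        key = (artist, normalize(title))
        if key not in first_seen or release_date < first_seen[key]:
            first_seen[key] = release_date
    return first_seen

def genuinely_new(tracks, since):
    """Keys of tracks whose earliest known release is on or after `since`."""
    first_seen = first_release_dates(tracks)
    return sorted(key for key, first in first_seen.items() if first >= since)

# Placeholder IDs; titles and release dates as reported in the text.
tracks = [
    ("id_2019", "Marracash", "Supreme - L'ego", date(2019, 10, 31)),
    ("id_2020", "Marracash", "SUPREME - L'Ego", date(2020, 3, 27)),
    ("id_rmx", "Marracash", "SPORT + muscoli (RMX)", date(2020, 3, 27)),
]
new_in_lockdown = genuinely_new(tracks, since=date(2020, 3, 6))
```

With these rows only the remix is kept as new, because the 2020 copy of SUPREME - L'ego is matched to its 2019 original. Matching on titles can fail when two different tracks by the same artist share a title, so the result should be checked by hand.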

Going back to COVID releases analysis

The same considerations hold for Ghali, who added two new songs, Cacao and Hasta la vista, to his album DNA, released on 21 February 2020.

Everything we have said so far gives us two important hints:

  1. New releases added to an existing album refresh the release date of the whole album (and of all its tracks).
  2. Our count of releases per period is in general an overestimate of the real number of releases

That said, Ghali and Marracash released only 2 and 3 new tracks in that period, which are remixes and/or featurings.

The only major album release by an Italian artist was GarbAge by Nitro. We can conclude that the pandemic had a relevant impact on the work of artists, as we had suspected, and that the lack of new releases affected the average number of streams on Spotify.

Remarks: Our conclusions are based on the weekly top 200, so we cannot conclude that the whole Spotify platform had fewer active users or streams, nor that the overall number of releases decreased. We can only conclude that, in the mainstream/commercial scene, there were no new releases by famous artists.
It would be interesting to isolate Italian releases and compare their impact on the Spotify charts with that of international releases, but unfortunately Spotify does not provide the language or country of release of a song.

It would also be interesting to investigate whether there were significant changes on the platform overall, such as a major shift in the genres listened to, but, again, we do not have access to the necessary data.

The following article shows that people were more inclined to listen to lo-fi chill music during the quarantine than in normal times: https://blog.chartmetric.com/covid-19-effect-on-the-global-music-business-part-1-genre/

Unfortunately, we cannot show or prove these changes, since our data is limited to the top 200, where mainstream music has not seen major changes in the most listened genres.

Comparing streams trend for different nations

In the following cells we plot the stream trends for different countries worldwide. We restricted our analysis to 6 countries because of the long time needed to download the data (due to the rate limits of the provided APIs).

In particular, we focused on major EU countries plus the USA and Brazil, choosing countries that adopted lockdown measures against COVID.

Data Normalization

We applied a Z-score normalization to the data of each nation. The data was heterogeneous across nations, and comparing streams on an absolute scale produced unreadable plots: the USA had an average number of streams 10 times higher than other countries.

Therefore, we normalize the data to make it comparable.
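For reference, the Z-score of a value x in a series with mean mu and standard deviation sigma is (x - mu) / sigma. The sketch below applies it separately to each country. The country codes and values are made up, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (v - mean) / standard deviation for every value in the series."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("z-score is undefined for a constant series")
    return [(v - mu) / sigma for v in values]

# Hypothetical weekly averages: same shape, but a scale 10 times larger.
weekly_avg = {
    "it": [100, 120, 90, 110],
    "us": [1000, 1200, 900, 1100],
}

# Each country is normalized with its own mean and standard deviation.
normalized = {country: z_scores(values) for country, values in weekly_avg.items()}
```

After normalization the two series coincide even though their absolute values differ by a factor of 10, which is what allows the trends of different countries to share one axis.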

Stream trend for each nation

The plots show a common decreasing trend around week 15 in all countries, followed by an increasing trend towards summer.

Features Analysis in Italy

The following charts show Spotify's musical features over time and the pandemic's influence on them. As previously mentioned, these results refer only to the top 50 because many songs remained in the top 200 for many weeks, so no absolute conclusions about global music trends can be drawn. However, we can still infer some useful information about the most popular songs and see whether users' preferences shifted in the period under study.

Features distribution

Feature distributions do not change significantly across years.
The only relevant change is in speechiness: it has a higher maximum value in 2020 than in previous years.

Valence

As defined by Spotify, valence measures the musical positiveness conveyed by a track, using a score between 0.0 and 1.0.
This feature fluctuates over the year and shows some periodicity. In particular, the average valence increases every year around Christmas time.
The steep increase during the lockdown period seems to break this periodicity and could be attributed to a tendency to look for comforting and cheerful music during social isolation. However, the significant variance of this feature, combined with the stability of the top 50 in terms of musical genres, suggests that this abrupt growth may have happened by chance.

Danceability

Danceability describes how suitable a track is for dancing based on a combination of musical elements, including tempo, rhythm, and overall regularity. Danceability increased in 2018 and since then has been almost constant, without showing particularly relevant variations even during the lockdown period.

Energy

Energy measures the overall intensity and activity of a track, as a function of its speed, loudness, and noisiness.

Acousticness

Acousticness, as the name suggests, measures the presence of acoustic instruments in a track.
It is possible to observe a significant drop during the lockdown period. Nevertheless, considering the complete history of this feature, this trend shows some periodicity, indicating that it may not be due to the pandemic.

Speechiness

Speechiness measures the presence of spoken words in a track. The plot shows no particular COVID-related trend, but a steady increase over the past four years, indicating the growing popularity of rap/hip-hop music, which dominates the top 200, as shown previously.

Radar Chart

This radar chart compares the features across different periods of time. It is interactive, so the reader can select periods and see how all the features differ between them.

Highlights

Our analysis showed that, within the limits of the data available to us, COVID-19 could have had an impact on music streams.
From the very first plot we noticed a decrease in the number of streams during the quarantine period in Italy.

We also noticed a strong correlation between the number of streams and the speechiness feature. Thanks to our domain knowledge (we are Italian and musicians), we hypothesized that Hip-Hop, Rap and Trap songs had a major impact on the Spotify stream trend.

In order to confirm this hypothesis, we decided to analyze the major peaks in the streams trend plot. Our analysis showed that we were right: major peaks in streams corresponded to musical events (Sanremo festival) and Rap, Hip-Hop and Trap releases.

As a consequence, we supposed that the decrease in streams could be caused by a smaller number of Rap, Hip-Hop and Trap releases. We filtered our dataset, selecting only the top 200 charts of the quarantine period (from March to May) for all years, and plotted the number of new Hip-Hop/Rap releases.

We noticed that in 2020 only Nitro released a new album, which did not have a positive impact on the stream trend. We suppose this is because the album's release date coincided with the lockdown announcement by the Italian government, so people's attention was caught by the news.

As discussed in our report, the new releases by Ghali and Marracash are actually spurious data: the late addition of remixes to an album refreshes the release date of all the songs it contains.
The most famous songs of the album (which usually remain in the top 200 for a while) are counted again as new releases, even though they are not.

Thus our hypothesis is plausible: the negative trend in streams was probably caused by a lack of new releases.

We also noticed no major changes in audio features: their distributions remained the same. This is to be expected: mainstream music is dominated by Pop, Hip-Hop and Rap songs, so analyzing only the top 200 will not show significant changes in feature distributions.

However, our conclusions are limited to the top 200 and are not valid in general.